Fast Learning in an Actor-Critic Architecture with Reward and Punishment

نویسندگان

Christian Balkenius

Stefan Winberg

چکیده

A reinforcement architecture is introduced that consists of three complementary learning systems with different generalization abilities. The ACTOR learns state-action associations, the CRITIC learns a goal-gradient, and the PUNISH system learns what actions to avoid. The architecture is compared to the standard actor-crititc and Q-learning models on a number of maze learning tasks. The novel architecture is shown to be superior on all the test mazes. Moreover, it shows how it is possible to combine several learning systems with different properties in a coherent reinforcement learning framework.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing three Critic Models of Reinforcement Learning in the Basal Ganglia Connected to a Detailed Actor in a S-R Task

Actor-Critic architectures of reinforcement learning were found to show a strong resemblance with known anatomy and function of a part of the vertebrate's brain: the basal ganglia. Based on this analogy, a large number of Actor-Critic models were simulated to reproduce behaviours of rats performing laboratory tasks. However, most of these models were tested in different tasks and it is often di...

متن کامل

Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture

The identification of learning mechanisms for locomotion has been the subject of much research for some time but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-base...

متن کامل

REINFORCEMENT LEARNING AND OPTIMAL CONTROL METHODS FOR UNCERTAIN NONLINEAR SYSTEMS By SHUBHENDU BHASIN A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy REINFORCEMENT LEARNING AND OPTIMAL CONTROL METHODS FOR UNCERTAIN NONLINEAR SYSTEMS By Shubhendu Bhasin August 2011 Chair: Warren E. Dixon Major: Mechanical Engineering Notions of optimal behavior expressed in natural systems led research...

متن کامل

Impulse control disorders in Parkinson's disease are associated with dysfunction in stimulus valuation but not action valuation.

A substantial subset of Parkinson's disease (PD) patients suffers from impulse control disorders (ICDs), which are side effects of dopaminergic medication. Dopamine plays a key role in reinforcement learning processes. One class of reinforcement learning models, known as the actor-critic model, suggests that two components are involved in these reinforcement learning processes: a critic, which ...

متن کامل

Totally Model-Free Reinforcement Learning by Actor-Critic Elman Networks in Non-Markovian Domains

In this paper we describe how an actor critic rein forcement learning agent in a non Markovian domain nds an optimal sequence of actions in a totally model free fashion that is the agent neither learns transitional probabilities and associated rewards nor by how much the state space should be augmented so that the Markov prop erty holds In particular we employ an Elman type re current neural ne...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2008

Fast Learning in an Actor-Critic Architecture with Reward and Punishment

نویسندگان

چکیده

منابع مشابه

Comparing three Critic Models of Reinforcement Learning in the Basal Ganglia Connected to a Detailed Actor in a S-R Task

Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture

REINFORCEMENT LEARNING AND OPTIMAL CONTROL METHODS FOR UNCERTAIN NONLINEAR SYSTEMS By SHUBHENDU BHASIN A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Impulse control disorders in Parkinson's disease are associated with dysfunction in stimulus valuation but not action valuation.

Totally Model-Free Reinforcement Learning by Actor-Critic Elman Networks in Non-Markovian Domains

عنوان ژورنال:

اشتراک گذاری